augmented example
Boosting Automatic Exercise Evaluation Through Musculoskeletal Simulation-Based IMU Data Augmentation
Spilz, Andreas, Oppel, Heiko, Munz, Michael
Automated evaluation of movement quality holds significant potential for enhancing physiotherapeutic treatments and sports training by providing objective, real-time feedback. However, the effectiveness of deep learning models in assessing movements captured by inertial measurement units (IMUs) is often hampered by limited data availability, class imbalance, and label ambiguity. In this work, we present a novel data augmentation method that generates realistic IMU data using musculoskeletal simulations integrated with systematic modifications of movement trajectories. Crucially, our approach ensures biomechanical plausibility and allows for automatic, reliable labeling by combining inverse kinematic parameters with a knowledge-based evaluation strategy. Extensive evaluations demonstrate that augmented variants closely resembles real-world data, significantly improving the classification accuracy and generalization capability of neural network models. Additionally, we highlight the benefits of augmented data for patient-specific fine-tuning scenarios, particularly when only limited subject-specific training examples are available. Our findings underline the practicality and efficacy of this augmentation method in overcoming common challenges faced by deep learning applications in physiotherapeutic exercise evaluation.
- North America > United States > South Carolina > Anderson County > Anderson (0.04)
- Europe > Germany (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.66)
- Health & Medicine > Consumer Health (0.46)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
- Health & Medicine > Therapeutic Area > Musculoskeletal (0.46)
Reprint: a randomized extrapolation based on principal components for data augmentation
Li, Le, Wei, Jiale, Peng, Pai, Chen, Qiyuan, Guedj, Benjamin, Cai, Bo
Data scarcity and data imbalance have attracted a lot of attention in many fields. Data augmentation, explored as an effective approach to tackle them, can improve the robustness and efficiency of classification models by generating new samples. This paper presents REPRINT, a simple and effective hidden-space data augmentation method for imbalanced data classification. Given hidden-space representations of samples in each class, REPRINT extrapolates, in a randomized fashion, augmented examples for target class by using subspaces spanned by principal components to summarize distribution structure of both source and target class. Consequently, the examples generated would diversify the target while maintaining the original geometry of target distribution. Besides, this method involves a label refinement component which allows to synthesize new soft labels for augmented examples. Compared with different NLP data augmentation approaches under a range of data imbalanced scenarios on four text classification benchmark, REPRINT shows prominent improvements. Moreover, through comprehensive ablation studies, we show that label refinement is better than label-preserving for augmented examples, and that our method suggests stable and consistent improvements in terms of suitable choices of principal components. Moreover, REPRINT is appealing for its easy-to-use since it contains only one hyperparameter determining the dimension of subspace and requires low computational resource.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hubei Province > Wuhan (0.05)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- (10 more...)
- Information Technology > Security & Privacy (0.67)
- Education (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs
Shiri, Fatemeh, Nguyen, Van, Moghimifar, Farhad, Yoo, John, Haffari, Gholamreza, Li, Yuan-Fang
Large Language Models (LLMs) demonstrate significant capabilities in processing natural language data, promising efficient knowledge extraction from diverse textual sources to enhance situational awareness and support decision-making. However, concerns arise due to their susceptibility to hallucination, resulting in contextually inaccurate content. This work focuses on harnessing LLMs for automated Event Extraction, introducing a new method to address hallucination by decomposing the task into Event Detection and Event Argument Extraction. Moreover, the proposed method integrates dynamic schema-aware augmented retrieval examples into prompts tailored for each specific inquiry, thereby extending and adapting advanced prompting techniques such as Retrieval-Augmented Generation. Evaluation findings on prominent event extraction benchmarks and results from a synthesized benchmark illustrate the method's superior performance compared to baseline approaches.
- Oceania > Australia (0.15)
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (2 more...)
SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning
Semi-supervised learning (SSL) has become popular in recent years because it allows the training of a model using a large amount of unlabeled data. However, one issue that many SSL methods face is the confirmation bias, which occurs when the model is overfitted to the small labeled training dataset and produces overconfident, incorrect predictions. To address this issue, we propose SequenceMatch, an efficient SSL method that utilizes multiple data augmentations. The key element of SequenceMatch is the inclusion of a medium augmentation for unlabeled data. By taking advantage of different augmentations and the consistency constraints between each pair of augmented examples, SequenceMatch helps reduce the divergence between the prediction distribution of the model for weakly and strongly augmented examples. In addition, SequenceMatch defines two different consistency constraints for high and low-confidence predictions. As a result, SequenceMatch is more data-efficient than ReMixMatch, and more time-efficient than both ReMixMatch ($\times4$) and CoMatch ($\times2$) while having higher accuracy. Despite its simplicity, SequenceMatch consistently outperforms prior methods on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. It also surpasses prior state-of-the-art methods by a large margin on large-scale datasets such as ImageNet, with a 38.46\% error rate. Code is available at https://github.com/beandkay/SequenceMatch.
- Europe > Russia (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > France (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)
STA: Self-controlled Text Augmentation for Improving Text Classifications
Wang, Congcong, Pontiveros, Gonzalo Fiz, Derby, Steven, Wijaya, Tri Kurniawan
Despite recent advancements in Machine Learning, many tasks still involve working in low-data regimes which can make solving natural language problems difficult. Recently, a number of text augmentation techniques have emerged in the field of Natural Language Processing (NLP) which can enrich the training data with new examples, though they are not without their caveats. For instance, simple rule-based heuristic methods are effective, but lack variation in semantic content and syntactic structure with respect to the original text. On the other hand, more complex deep learning approaches can cause extreme shifts in the intrinsic meaning of the text and introduce unwanted noise into the training data. To more reliably control the quality of the augmented examples, we introduce a state-of-the-art approach for Self-Controlled Text Augmentation (STA). Our approach tightly controls the generation process by introducing a self-checking procedure to ensure that generated examples retain the semantic content of the original text. Experimental results on multiple benchmarking datasets demonstrate that STA substantially outperforms existing state-of-the-art techniques, whilst qualitative analysis reveals that the generated examples are both lexically diverse and semantically reliable.
- North America > United States > Maryland (0.04)
- Europe > Ireland (0.04)
- Africa > Mozambique (0.04)
- (11 more...)
On-the-fly Denoising for Data Augmentation in Natural Language Understanding
Fang, Tianqing, Zhou, Wenxuan, Liu, Fangyu, Zhang, Hongming, Song, Yangqiu, Chen, Muhao
Data Augmentation (DA) is frequently used to automatically provide additional training data without extra human annotation. However, data augmentation may introduce noisy data that impairs training. To guarantee the quality of augmented data, existing methods either assume no noise exists in the augmented data and adopt consistency training or use simple heuristics such as training loss and diversity constraints to filter out ``noisy'' data. However, those filtered examples may still contain useful information, and dropping them completely causes loss of supervision signals. In this paper, based on the assumption that the original dataset is cleaner than the augmented data, we propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data. A simple self-regularization module is applied to force the model prediction to be consistent across two distinct dropouts to further prevent overfitting on noisy labels. Our method can be applied to augmentation techniques in general and can consistently improve the performance on both text classification and question-answering tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- (19 more...)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Graph Rationalization with Environment-based Augmentations
Liu, Gang, Zhao, Tong, Xu, Jiaxin, Luo, Tengfei, Jiang, Meng
Rationale is defined as a subset of input features that best explains or supports the prediction by machine learning models. Rationale identification has improved the generalizability and interpretability of neural networks on vision and language data. In graph applications such as molecule and polymer property prediction, identifying representative subgraph structures named as graph rationales plays an essential role in the performance of graph neural networks. Existing graph pooling and/or distribution intervention methods suffer from lack of examples to learn to identify optimal graph rationales. In this work, we introduce a new augmentation operation called environment replacement that automatically creates virtual data examples to improve rationale identification. We propose an efficient framework that performs rationale-environment separation and representation learning on the real and augmented examples in latent spaces to avoid the high complexity of explicit graph decoding and encoding. Comparing against recent techniques, experiments on seven molecular and four polymer real datasets demonstrate the effectiveness and efficiency of the proposed augmentation-based graph rationalization framework.
Untapped Potential of Data Augmentation: A Domain Generalization Viewpoint
Piratla, Vihari, Shankar, Shiv
Data augmentation is a popular pre-processing trick to improve generalization accuracy. It is believed that by processing augmented inputs in tandem with the original ones, the model learns a more robust set of features which are shared between the original and augmented counterparts. However, we show that is not the case even for the best augmentation technique. In this work, we take a Domain Generalization viewpoint of augmentation based methods. This new perspective allowed for probing overfitting and delineating avenues for improvement. Our exploration with the state-of-art augmentation method provides evidence that the learned representations are not as robust even towards distortions used during training. This suggests evidence for the untapped potential of augmented examples.
- North America > United States > Massachusetts (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)